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The purpose of time series analysis via mechanistic models is to 
reconcile the known or hypothesized structure of a dynamical sys- 
tem with observations collected over time. We develop a framework 
for constructing nonlinear mechanistic models and carrying out infer- 
ence. Our framework permits the consideration of implicit dynamic 
models, meaning statistical models for stochastic dynamical systems 
which are specified by a simulation algorithm to generate sample 
paths. Inference procedures that operate on implicit models are said 
to have the plug-and-play property. Our work builds on recently de- 
veloped plug-and-play inference methodology for partially observed 
Markov models. We introduce a class of implicitly specified Markov 
chains with stochastic transition rates, and we demonstrate its ap- 
plicability to open problems in statistical inference for biological sys- 
tems. As one example, these models are shown to give a fresh per- 
spective on measles transmission dynamics. As a second example, we 
present a mechanistic analysis of cholera incidence data, involving 
interaction between two competing strains of the pathogen Vibrio 
cholerae. 

1. Introduction. A dynamical system is a process whose state varies with 
time. A mechanistic approach to understanding such a system is to write 
down equations, based on scientific understanding of the system, which de- 
scribe how it evolves with time. Further equations describe the relationship 
of the state of the system to available observations on it. Mechanistic time 
series analysis concerns drawing inferences from the available data about 
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the hypothesized equations [Brillinger (2008)]. Questions of general interest 
include the following. Are the data consistent with a particular model? If so, 
for what range of values of model parameters? Does one mechanistic model 
describe the data better than another? 

The defining principle of mechanistic modeling is that the model structure 
should be chosen based on scientific considerations, rather than statistical 
convenience. Although linear Gaussian models give an adequate represen- 
tation of some processes [Durbin and Koopman (2001)], nonlinear behavior 
is an essential property of many systems. This leads to a need for statisti- 
cal modeling and inference techniques applicable to rather general classes 
of processes. In the absence of alternative statistical methodology, a com- 
mon approach to mechanistic investigations is to compare data, qualita- 
tively or via some ad-hoc metric, with simulations from the model. It is 
a challenging problem, of broad scientific interest, to increase the range of 
mechanistic time series models for which formal statistical inferences, mak- 
ing efficient use of the data, can be made. Here, we develop a framework 
in which simulation of sample paths is employed as the basis for likelihood- 
based inference. Inferential techniques that require only simulation from the 
model (i.e., for which the model could be replaced by a black box which 
inputs parameters and outputs sample paths) have been called "equation 
free" [Kevrekidis, Gear and Hummer (2004), Xiu, Kevrekidis and Ghanem 
(2005)]. We will use the more descriptive expression "plug and play." 

Plug-and-play inference techniques can be applied to any time series 
model for which a numerical procedure to generate sample paths is avail- 
able. We call such models implicit, meaning that closed-form expressions for 
transition probabilities or sample paths are not required. The goal of this 
paper is to develop plug-and-play inference for a general class of implicitly 
specified stochastic dynamic models, and to show how this capability enables 
new and improved statistical analyses addressing current scientific debates. 
In other words, we introduce and demonstrate a framework for time series 
analysis via mechanistic models. 

Here, we concern ourselves with partially observed, continuous-time, non- 
linear, Markovian stochastic dynamical systems. The particular combina- 
tion of properties listed above is chosen because it arises naturally when 
constructing a mechanistic model. Although observations will typically be 
at discrete times, mechanistic equations describing underlying continuous 
time systems are most naturally described in continuous time. If all quanti- 
ties important for the evolution of the system are explicitly modeled, then 
the future evolution of the system depends on the past only through the 
current state, that is, the system is Markovian. A stochastic model is pre- 
requisite for mechanistic time series analysis, since chance variability is re- 
quired to explain the difference between the data and the solution to noise- 
free deterministic equations. Statistical analysis is simpler if stochasticity 
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can be confined to the observation process (the statistical problem becomes 
nonlinear regression) or if the stochastic dynamical system is perfectly ob- 
served [Basawa and Prakasa Rao (1980)]. Here we address the general case 
with both forms of stochasticity. Despite considerable work on such mod- 
els [Anderson and Moore (1979), Liu (2001), Doucet, de Freitas and Gordon 
(2001), Cappe, Moulines and Ryden (2005)], statistical methodology which 
is readily applicable for a wide range of models has remained elusive. For 
example, Markov chain Monte Carlo and Monte Carlo Expectation-Maximi- 
zation algorithms [Cappe, Moulines and Ryden (2005)] have technical dif- 
ficulties handling continuous time dynamic models [Beskos et al. (2006)]; 
these two approaches also lack the plug-and-play property. 

Several inference techniques have previously been proposed which are 
compatible with plug-and-play inference from partially observed Markov 
processes. Nonlinear forecasting [Kendall et al. (1999)] is a method of simu- 
lated moments which approximates the likelihood. Iterated filtering is a re- 
cently developed method [Ionides, Breto and King (2006)] which provides a 
way to calculate a maximum likelihood estimate via sequential Monte Carlo, 
a plug and play filtering technique. Approximate Bayesian sequential Monte 
Carlo plug-and-play methodologies [Liu and West (2001), Toni et al. (2008)] 
have also been proposed. 

In Section 2 we introduce a new and general class of implicitly spec- 
ified models. Section 3 is concerned with inference methodology and in- 
cludes a review of the iterated filtering approach of Ionides, Breto and King 
(2006). Section 4 discusses the role of our modeling and inference frame- 
work for the analysis of biological systems. Two concrete examples are de- 
veloped, investigating measles (Section 4.1) and cholera (Section 4.2). Sec- 
tion 5 is a concluding discussion. The motivating examples in this paper 
have led to an emphasis on modeling infectious diseases. However, the issue 
of mechanistic modeling of time series data occurs in many other contexts. 
Indeed, it is too widespread to give a comprehensive review and we in- 
stead list some examples: molecular biochemistry [Kou, Xie and Liu (2005)]; 
wildlife ecology [Newman and Lindley (2006)]; cell biology [Ionides et al. 
(2004)]; economics [Fernandez- Villaverde and Rubio- Ramirez (2005)]; sig- 
nal processing [Arulampalam et al. (2002)]; data assimilation for numeri- 
cal models [Houtekamer and Mitchell (2001)]. The study of infectious dis- 
ease, however, has a long history of motivating new modeling and data 
analysis methodology [Kermack and McKendrick (1927), Bartlett (1960), 
Anderson and May (1991), Finkenstadt and Grenfell (2000), 
Ionides, Breto and King (2006), King et al. (2008b)]. The freedom to carry 
out formal statistical analysis based on mechanistically motivated, nonlin- 
ear, nonstationary, continuous time stochastic models is a new development 
which promises to be a useful tool for a variety of applications. 
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2. Compartment models with stochastic rates. Many mechanistic mod- 
els can be viewed in terms of flows between compartments [Jacquez (1996), 
Matis and Kiffe (2000)]. Here, we introduce a class of implicitly specified 
stochastic compartment models; widespread biological applications of these 
models will be discussed in Section 4, with broader relevance and further 
generalizations discussed in Section 5. The reader may choose initially to 
pass superficially through the technical details of this section. We present a 
general model framework which is, at once, an example of an implicitly spec- 
ified mechanistic model, a necessary prelude to our following data analyses, 
and a novel class of Markov processes requiring some formal mathematical 
treatment. 

A general compartment model is a vector- valued process X(t) = {X\ (t), . . . , 
X c (t)) denoting the (integer or real- valued) counts in each of c compart- 
ments. The basic characteristic of a compartment model is that X(t) can 
be written in terms of the flows Nij(t) from i to j, via a "conservation of 
mass" identity: 

(1) X i (t)=X i (0)+J2N ji (t) 

Each flow Nij is associated with a rate function = /j,ij(t,X(t)). There 
are many ways to develop concrete interpretations of such a compartment 
model. For the remainder of this section, we take Xi(t) to be nonnegative 
integer- valued, so X(t) models a population divided into c disjoint categories 
and Hij is the rate at which each individual in compartment i moves to j. 
In this context, it is natural to require that {Nij(t), 1 < i < c, 1 < j < c} is 
a collection of nondecreasing integer-valued stochastic processes satisfying 
the constraint Xi(t) > for all i and t. The conservation equation (1) makes 
the compartment model closed in the sense that individuals cannot enter 
or leave the population. However, processes such as immigration, birth or 
death can be modeled via the introduction of additional source and sink 
compartments. 

We wish to introduce white noise to model stochastic variation in the 
rates (discussion of this decision is postponed to Sections 4 and 5). We 
refer to white noise as the derivative of an integrated noise process with 
stationary independent increments [Karlin and Taylor (1981)]. The integral 
of a white noise process over an interval is thus well defined, even when 
the sample paths of the integrated noise process are not formally differen- 
tiable. Specifically, we introduce a collection of integrated noise processes 
{Tij(t), 1 < i < c, 1 < j < c} with the properties: 

(PI) Independent increments: The collection of increments {Tijfo) — Tij(t\), 
1 < i < c, 1 < j < c} is presumed to be independent of {Tij(ti) — T^fe), 
1 < i < c, 1 < j < c} for all t\ < t 2 < t 3 < t 4 . 
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1. 


Divide the interval [0, T] into N intervals of width 5 = 


T/N 


2. 


Set initial value X(0) 




3. 


FOR n = to N - 1 




4. 


Generate noise increments {Ar^- = Yij(n8 + 5) — Tij 


(nS)} 


5. 


Generate process increments 
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where pij = pij{{^ij{n5,X{n5))},{ATij}) is given in 


(3) 


6. 


Set Xi(n6 + 5)=R l + £, ^ Aify 




7. END FOR 





Fig. 1. Euler scheme for a numerical solution of the Markov chain specified by (2). In 
steps 5 and 6, Ri counts the individuals who remain in compartment i during the current 
Euler increment. 



(P2) Stationary increments: The collection of increments {Tijfo) — Tij(ti), 
1 < i < c, 1 < j < c} has a joint distribution depending only on £2 — 1\. 
(P3) Nonnegative increments: Tijfo) — r^(ii) > for ti > t\. 

We have not assumed that different integrated noise processes and T^i 
are independent; their increments could be correlated, or even equal. These 
integrated noise processes define a collection of noise processes given by 
£,ij(t) = jjjTijit). Since Tij(t) is increasing, £ij(t) is nonnegative and /J>ij£ij(t) 
can be interpreted as a rate with multiplicative white noise. In this context, 
it is natural to assume the following: 

(P4) Unbiased multiplicative noise: E\Tij(t)] =t. 

At times, we may further assume one or more of the following properties: 

(P5) Partially independent noises: For each i, {Tij(t)} is independent of 

{T ik (t)} for aHj^k. 
(P6) Independent noises: {Tij(t)} is independent of {Tki(t)} for all pairs 

(i,j)^(k,l). 

(P7) Gamma noises: Marginally r^- (t + 5) — (t) ~ Gamma(5 / afj , afj ) , the 
gamma distribution whose shape parameter is 6/afj and scale param- 
eter is (Tij, with corresponding mean 5 and variance Safj. We call afj 
an infinitesimal variance parameter [Karlin and Taylor (1981)]. 

The choice of gamma noise in (P7) gives a convenient concrete example. A 
wide range of Levy processes [Sato (1999)] could be alternatively employed. 

We proceed to construct a compartment model as a continuous time 
Markov chain via the limit of coupled discrete-time multinomial processes 
with random rates. Similar Euler multinomial schemes (without noise in 
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the rate function) are a standard numerical approach for studying popula- 
tion dynamics [Cai and Xu (2007)]. The representation of our model given 
in (2) is implicit since numerical solution is available to arbitrary precision 
via evaluating the coupled multinomial processes in a discrete time-step 
Euler scheme (described in Figure 1). Let AiVy = Nij(t + 6) — Nij(t) and 
Arjj = Tij (t + 5) — Tij(t). We suppose that 

P[ANij = ny, for all 1 < i < c, 1 < j < c, i 7^ j \ X(t) = (x±, . . . , x c )] 
(2) =E 



+ o(5), 

where rj = Xi — J2k^i n ik, ( ni . n . n ) is a multinomial coefficient and 
Pij =Pij({fii j (t,x)},{AT ij (t)}) 

(3) 

= ^1 -expi^-^fiikATif^JiXijATij^ ^n ik AY ik , 

with = fiij(t,x). Theorem A.l, which is stated in Appendix A and proved 
in a supplement to this article [Breto et al. (2009)], shows that (2) defines 
a continuous time Markov chain when the conditions (P1)-(P5) hold. A 
finite-state continuous time Markov chain is specified by its infinitesimal 
transition probabilities [Bremaud (1999)], which are in turn specified by 
(2). Theorem A. 2, also stated in Appendix A and proved in the supplement 
[Breto et al. (2009)], determines the infinitesimal transition probabilities re- 
sulting from (2) supposing the conditions (P1)-(P7). When the infinitesimal 
transition probabilities can be calculated exactly, exact simulation methods 
are available [Gillespie (1977)]. In practice, numerical schemes based on Eu- 
ler approximations may be preferable — Euler schemes for Markov chain com- 
partment models have been proposed based on Poisson [Gillespie (2001)], 
binomial [Tian and Burrage (2004)] and multinomial [Cai and Xu (2007)] 
approximations. Our choice of a model for which convenient numerical so- 
lutions are available (e.g., via the procedure in Figure 1) comes at the ex- 
pense of difficulty in computing analytic properties of the implicitly-defined 
continuous-time process. However, since the properties of the model will be 
investigated by simulation, via a plug-and-play methodology, the analytic 
properties of the continuous-time process are of relatively little interest. 

For the gamma noise in (P7), the special case where o~ij = is taken to 
correspond to = 1- If o~ij = for all i and j, then (2) becomes the Pois- 
son system widely used to model demographic stochasticity in population 
models [Bremaud (1999), Bartlett (I960)]. We therefore call a process de- 
fined by (2) a Poisson system with stochastic rates. Constructions similar to 
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Theorem A.l are standard for Poisson systems [Bremaud (1999)], but here 
care is required to deal with the novel inclusion of white noise in the rate 
process. Our formulation for adding noise to Poisson systems can be seen 
as a generalization of subordinated Levy processes [Sato (1999)], though we 
are not aware of previous work on the more general Markov processes con- 
structed here. It is only the recent development of plug-and-play inference 
methodology that has led to the need for flexible Markov chain models with 
random rates. 

2.1. Comments on the role of numerical solutions based on discretizations. 
In this section we have proposed employing a discrete-time approximation 
to a continuous-time stochastic process. Numerical solutions based on dis- 
cretizations of space and time are ubiquitous in the applied mathematical 
sciences and engineering. A standard technique is to investigate whether 
further reduction in the size of the discretization substantially affects the 
conclusions of the analysis. When sufficiently fine discretization is not com- 
putationally feasible, the numerical solutions may still have some value. Cli- 
mate modeling and numerical weather prediction are examples of this: such 
systems have important dynamic behavior at scales finer than any feasible 
discretization, but numerical models nevertheless have a scientific role to 
play [Solomon et al. (2007)]. 

When numerical modeling is used as a scientific tool, conclusions about 
the limiting continuous time model will be claimed based on properties of 
the model that are determined by simulation of realizations from the dis- 
cretized model. Such conclusions depend on the assumption that properties 
of the numerical solution which are stable as the numerical approximation 
timestep, 5, approaches should indeed be properties of the limiting con- 
tinuous time process. This need not always be true, which is one reason why 
analytic properties, such as Theorems A.l and A. 2, are valuable. 

From another point of view, an argument for being content with a nu- 
merical approximation to (2) for sufficiently small 5 is that there may be 
no scientific reason to prefer a true continuous time model over a fine dis- 
cretization. For example, when modeling year-to-year population dynamics, 
continuous time models of adequate simplicity for data analysis typically 
will not include diurnal effects. Thus, there is no particular reason to think 
the continuous time model more credible than a discrete time model with a 
step of one day. One can think of a set of equations defining a continuous 
time process, combined with a specified discretization, as a way of writing 
down a discrete time model, rather than treating the continuous time model 
as a gold standard against which all discretizations must be judged. 

3. Plug-and-play inference methodology. We suppose now that the dy- 
namical system depends on some unknown parameter vector 9 G M. de , so that 
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fJ-ij = l^ij(t, x, 9) and erjj = aij(t, x, 9). Inference on 9 is to be made based on 
observations yi-.N = (yi, ■ ■ ■ ,Un) made at times tuN = • • • , *at), with y n € 
M. dy . Conditionally on X(t±), . . . ,X(tj^), we suppose that the observations 
are drawn independently from a density g(y n \X(t n ),9). Likelihood-based in- 
ference can be carried out for the framework of Section 2 using the iterated 
filtering methodology proposed by Ionides, Breto and King (2006), imple- 
mented as described in Figure 2. Iterated filtering is a technique to maximize 
the likelihood for a partially observed Markov model, permitting calcula- 
tion of maximum likelihood point estimates, confidence intervals (via profile 
likelihood, bootstrap or Fisher information) and likelihood ratio hypothesis 
tests. Iterated filtering has been developed in response to challenges aris- 
ing in ecological and epidemiological data analysis [Ionides, Breto and King 
(2006, 2008), King et al. (2008b)], and appears here for the first time in 
the statistical literature. We refer to Ionides, Breto and King (2006) for the 
mathematical results concerning the iterated filtering algorithm in Figure 2. 
We proceed to review the methodology and its heuristic motivation, to dis- 
cuss implementation issues, and to place iterated filtering in the context of 
alternative statistical methodologies. 

For nonlinear non-Gaussian partially observed Markov models, the likeli- 
hood function can typically be evaluated only inexactly and at considerable 
computational expense. The iterated filtering procedure takes advantage 
of the partially observed Markov structure to enable computationally effi- 
cient maximization. A useful property of partially observed Markov mod- 
els is that, if the parameter 6 is replaced by a random walk #i : jv> with 
E[8q] = 9 and E[9 n \9 n ^\\ = 8 n -\ for n > 1, the calculation of 9 n = E[9 n \yi :n ] 
and V n = Var(0 n |yi :n _i) is a well-studied and computationally convenient fil- 
tering problem [Kitagawa (1998), Liu and West (2001)]. Additional stochas- 
ticity of this kind is introduced in steps 4 and 12 of Figure 2. This leads 
to time- varying parameter estimates, so 9i(t n ) in Figure 2 is an estimate 
of 9i depending primarily on the data at and shortly before time t n . The 
updating rule in step 16, giving an appropriate way to combine these tem- 
porally local estimates, is the main innovative component of the procedure. 
Ionides, Breto and King (2006) showed that this algorithm converges to the 
maximum of the likelihood function, under sufficient regularity conditions 
to justify a Taylor series expansion argument. Only the mean and variance 
of the stochasticity added in steps 4 and 12 play a role in the limit as n 
increases. The specific choice of the normal distribution for steps 4 and 12 
of Figure 2 is therefore unimportant, but does require that the parameter 
space is unbounded. This is achieved by reparameterizing where necessary; 
we use a log transform for positive parameters and a logit transform for 
parameters lying in the interval [0, 1]. 

Steps 2, 11 and 17 of Figure 2 concern the initial values of the state vari- 
ables. For stationary processes, one can think of these as unobserved random 
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MODEL INPUT: f (■), g (■{■), Vl , . . . , y N , t , . . . , t N 

ALGORITHMIC PARAMETERS: integers J, L, Af; 
scalars < a < 1, 6>0; vectors xj 1 ^, 0^; 
positive definite symmetric matrices £/, Eg. 

1. FOR to = 1 to M 

2. X/(t ,i) ~iV[xj m) ,a m_1 S/] ; j = l,...,J 

3. X F (t J)=X I (t ,j) 

4. ^(to^V^M.bo^Se] 

5. 0(i o ) = # (m) 

(3. FOR n = 1 to X 

7. Ap(t n ,j) = /(X F (t n _ 1 ,i),t n _ 1 ,t n ^(t n „ 1 ,j),VF) 

8. = #(y n |X P (i n ,j),t n ,6>(t n _i, j)) 

9. draw ki,...,kj such that 
Prob(/cj = z) = w(n, i) / J2i w(n, £) 

10. X F (t n ,j) = X P (t n ,k j ) 

11. Xl(tn,j)=*l(tn-l,fc,0 

12. 0(i nL j) - X[0(i n _i, fcjO.o""" 1 ^ " 

13. Set 0i(t n ) to be the sample mean of {#j(i n _i, kj),j = 1, . . . , J} 

14. Set Vi(t n ) to be the sample variance of {0i(t n ,j),j = 1, . . . , J} 

15. END FOR 

16. e\ m+1) = <?H + F i (t 1 )EliV- 1 (t re )(0 i (* n ) -^(tn-i)) 

17. Set xj m+1 ^ to be the sample mean of {Xi(tL,j),j = 1, . . . , J} 

18. END FOR 

RETURN 

maximum likelihood estimate for parameters, = 
maximum likelihood estimate for initial values, X(to) =x\ M+l ^ 
maximized conditional log likelihood estimates, £ n (9) =log(%2jW(n,j)/J) 
maximized log likelihood estimate, £{9) = ^2 n £n{9) 



Fig. 2. Implementation of likelihood maximization by iterated filtering. N[fj,, E] corre- 
sponds to a normal random variable with mean vector fj, and covariance matrix E; X(t n ) 
takes values in R dx ; y n takes values in R d " ; takes values in K. ds and has components 
{9i,i = 1, . ..,d$}; /(■) is the transition rule described in (4); g(-\-) is the measurement 
density for the observations yi.N- 



variables drawn from the stationary distribution. However, for nonstation- 
ary processes (such as those considered in Sections 4.1 and 4.2, and any 
process modeled conditional on measured covariates) these initial values are 
treated as unknown parameters. These parameters require special attention, 
despite not usually being quantities of primary scientific interest, since the 
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information about them is concentrated at the beginning of the time series, 
whereas the computational benefit of iterated filtering arises from combin- 
ing information accrued through time. Steps 2, 11 and 17 implement a fixed 
lag smoother [Anderson and Moore (1979)] to iteratively update the initial 
value estimates. The value of the fixed lag (denoted by L in Figure 2) should 
be chosen so that there is negligible additional information about the initial 
values after time t^. Choosing L too large results in slower convergence, 
choosing L too small results in bias. 

Iterated filtering, characterized by the updating rule in step 16 of Fig- 
ure 2, can be implemented via any filtering method. The procedure in 
Figure 2 employs a basic sequential Monte Carlo filter which we found 
to be adequate for the examples in Section 4 and also for previous data 
analyses [Ionides, Breto and King (2006), King et al. (2008b)]. Many 
extensions and generalizations of sequential Monte Carlo have been pro- 
posed [Arulampalam et al. (2002), Doucet, de Freitas and Gordon (2001), 
Del Moral, Doucet and Jasra (2006)] and could be employed in an iterated 
filtering algorithm. If the filtering technique is plug-and-play, then likeli- 
hood maximization by iterated filtering also has this property. Basic sequen- 
tial Monte Carlo filtering techniques do have the plug-and-play property, 
since only simulations from the transition density of the dynamical system 
are required and not evaluation of the density itself. Although sequential 
Monte Carlo algorithms are usually written in terms of transition densities 
[Arulampalam et al. (2002), Doucet, de Freitas and Gordon (2001)], we em- 
phasize the plug-and-play property of the procedure in Figure 2 by specifying 
a Markov process at a sequence of times to < t\ < ■ • ■ < tjy via a recursive 
transition rule, 

(4) X(t n ) = /(A(t n _x), t„_ 1; t n , 9, W). 

Here, it is understood that W is some random variable which is drawn 
independently each time /(•) is evaluated. In the context of the plug-and- 
play philosophy, /(•) is the algorithm to generate a simulated sample path 
of X(t) at the discrete times ti,. .. , t/v given an initial value X(to). 

To check whether global maximization has been achieved, one can and 
should consider various different starting values [i.e., 9^ and Xj 1 ^ in Fig- 
ure 2] . Attainment of a local maximum can be checked by investigation of the 
likelihood surface local to an estimate 9. Such an investigation can also give 
rise to standard errors, and we describe here how this was carried out for the 
results in Table 2. We write £(9) for the log likelihood function, and we call 
a graph of 1(9 + z5i) against 9\ + z a sliced likelihood plot; here, 5i is a vector 
of zeros with a one in the ith position, and 9 has components {9±, . . . ,9d e }. 
If 9 is located at a local maximum of each sliced likelihood, then 9 is a local 
maximum of £(9), supposing £(9) is continuously differentiable. We check 
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this by evaluating i(Q + ZijSi) for a collection {zij} defining a neighborhood 
of 9. The likelihood is evaluated with Monte Carlo error, as described in 
Figure 2, with £/ = 0, = and M = 1. Therefore, it is necessary to make 
a smooth approximation to the sliced likelihood [Ionides (2005)] based on 
the available evaluations. The size of the neighborhood (specified by {zij}) 
and the size of the Monte Carlo sample (specified by J in Figure 2) should 
be large enough that the local maximum for each slice is clearly identified. 
Computing sliced likelihoods requires moderate computational effort, linear 
in the dimension of 6. As a by-product of the sliced likelihood computation, 
one has access to the conditional log likelihood values, defined in Figure 2 
and written here as £ n ij = l n (Q + z^Si). Regressing £ n ij on for each fixed 
i and n gives rise to estimates £ n i for the partial derivatives of the con- 
ditional log likelihoods. Standard errors of parameters are found from the 
estimated observed Fisher information matrix [Barndorff-Nielsen and Cox 
(1994)], with entries given by = J2 n ^ndnk- We prefer profile likelihood 
calculations, such as Figure 7, to derive confidence intervals for quantities 
of particular interest. However, standard errors derived from estimating the 
observed Fisher information involve substantially less computation. 

Parameter estimation for partially observed nonlinear Markov processes 
has long been a challenging problem, and it is premature to expect a fully 
automated statistical procedure. The implementation of iterated filtering 
in Figure 2 employs algorithmic parameters which require some trial and 
error to select. However, once the likelihood has been demonstrated to be 
successfully maximized, the algorithmic parameters play no role in the sci- 
entific interpretation of the results. 

Other plug-and-play inference methodologies applicable to the models of 
Section 2 have been developed. Nonlinear forecasting [Kendall et al. (1999)] 
has neither the statistical efficiency of a likelihood-based method nor the 
computational efficiency of a filtering-based method. The Bayesian sequen- 
tial Monte Carlo approximation of Liu and West (2001) combines likelihood- 
based inference with a filtering algorithm, but is not supported by theoret- 
ical guarantees comparable to those presented by Ionides, Breto and King 
(2006) for iterated filtering. A recently developed plug-and-play approach 
to approximate Bayesian inference [Sisson, Fan and Tanaka (2007)] has been 
applied to partially observed Markov processes [Toni et al. (2008)]. Other re- 
cent developments in Bayesian methodology for partially observed Markov 
processes include Newman et al. (2008), Cauchemez and Ferguson (2008), 
Cauchemez et al. (2008), Beskos et al. (2006), Poison, Stroud and Muller 
(2008), Boys, Wilkinson and Kirkwood (2008). This research has been mo- 
tivated by the inapplicability of general Bayesian software, such as WinBUGS 
[Lunn et al. (2000)], for many practical inference situations [Newman et al. 
(2008)]. 
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A numerical implementation of iterated filtering is available via the soft- 
ware package pomp [King, Ionides and Breto (2008a)] which operates in the 
free, open-source, R computing environment [R Development Core Team 
(2006)]. This package contains a tutorial vignette as well as further ex- 
amples of mechanistic time series models. The data analyses of Section 4 
were carried out using pomp, in which the algorithms in Figures 1 and 2 are 
implemented via the functions reulermultinom and mif respectively. 

4. Time series analysis for biological systems. Mathematical models for 
the temporal dynamics of biological populations have long played a role 
in understanding fluctuations in population abundance and interactions 
between species [Bjornstad and Grenfell (2001), May (2004)]. When using 
models to examine the strength of evidence concerning rival hypotheses 
about a system, a model is typically required to capture not just the qualita- 
tive features of the dynamics but also to explain quantitatively all the avail- 
able observations on the system. A critical aspect of capturing the statistical 
behavior of data is an adequate representation of stochastic variation, which 
is a ubiquitous component of biological systems. Stochasticity can also play 
an important role in the qualitative dynamic behavior of biological systems 
[Coulson, Rohani and Pascual (2004), Alonso, McKane and Pascual (2007)]. 
Unpredictable event times of births, deaths and interactions between in- 
dividuals result in random variability known as demographic stochasticity 
(from a microbiological perspective, the individuals in question might be 
cells or large organic molecules). The environmental conditions in which 
the system operates will fluctuate considerably in all but the best experi- 
mentally controlled situations, resulting in environmental stochasticity. The 
framework of Section 2 provides a general way to build the phenomenon of 
environmental stochasticity into continuous-time population models, via the 
inclusion of variability in the rates at which population processes occur. To 
our knowledge, this is the first general framework for continuous time, dis- 
crete population dynamics which allows for both demographic stochasticity 
[infinitesimal variance equal to the infinitesimal mean; see Karlin and Taylor 
(1981) and Appendix B] and environmental stochasticity [infinitesimal vari- 
ance greater than the infinitesimal mean; see supporting online material 
Breto et al. (2009)]. 

From the point of view of statistical analysis, environmental stochasticity 
plays a comparable role for dynamic population models to the role played 
by over-dispersion in generalized linear models. Models which do not per- 
mit consideration of environmental stochasticity lead to strong assumptions 
about the levels of stochasticity in the system. This relationship is discussed 
further in Section 5. For generalized linear models, over-dispersion is com- 
monplace, and failure to account properly for it can give rise to misleading 
conclusions [McCullagh and Nelder (1989)]. Phrased another way, including 
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sufficient stochasticity in a model to match the unpredictability of the data 
is essential if the model is to be used for forecasting, or predicting a quan- 
titative range of likely effects of an intervention, or estimating unobserved 
components of the system. 

We present two examples. First, Section 4.1 demonstrates the role of en- 
vironmental stochasticity in measles transmission dynamics, an extensively 
studied and relatively simple biological system. Second, Section 4.2 analyzes 
data on competing strains of cholera to demonstrate the modeling frame- 
work of Section 2 and the inference methodology of Section 3 on a more 
complex system. 

4.1. Environmental stochasticity in measles epidemics. The challenges 
of moving from mathematical models, which provide some insight into the 
system dynamics, to statistical models, which both capture the mechanistic 
basis of the system and statistically describe the data, are well 
documented by a sequence of work on the dynamics of measles epidemics 
[Bartlett (1960), Anderson and May (1991), Finkenstadt and Grenfell (2000), 
Bjornstad, Finkenstadt and Grenfell (2002), Morton and Finkenstadt (2005), 
Cauchemez and Ferguson (2008)]. Measles is no longer a major developed 
world health issue but still causes substantial morbidity and mortality, par- 
ticularly in sub-Saharan Africa [Grais et al. (2006), Conlan and Grenfell 

(2007) ]. The availability of excellent data before the introduction of 
widespread vaccination has made measles a model epidemic system. Re- 
cent attempts to analyze population-level time series data on measles epi- 
demics via mechanistic dynamic models have, through statistical expedi- 
ency, been compelled to use a discrete-time dynamic model using timesteps 
synchronous with the reporting intervals [Finkenstadt and Grenfell (2000), 
Bjornstad, Finkenstadt and Grenfell (2002), Morton and Finkenstadt (2005)] 
Such discrete time models risk incorporating undesired artifacts [Glass, 
Xia and Grenfell (2003)]. The first likelihood-based analysis via continu- 
ous time mechanistic models, incorporating only demographic stochasticity, 
was published while this paper was under review [Cauchemez and Ferguson 

(2008) ]. From another perspective, the properties of stochastic dynamic epi- 
demic models have been studied extensively in the context of continuous 
time models with only demographic stochasticity [Bauch and Earn (2003), 
Dushoff et al. (2004), Wearing, Rohani and Keeling (2005)]. We go beyond 
previous approaches, by demonstrating the possibility of carrying out mod- 
eling and data analysis via continuous time mechanistic models with both 
demographic and environmental stochasticity. For comparison with the work 
of Cauchemez and Ferguson (2008), we analyze measles epidemics occurring 
in London, England during the pre- vaccination era. The data, reported cases 
from 1948 to 1964, are shown in Figure 3. 
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Fig. 3. Biweekly recorded measles cases (solid line) and birth rate (dotted line, estimated 
by smooth interpolation of annual birth statistics) for London, England. 



Measles has a relatively simple natural history, being a highly infectious 
disease with lifelong immunity following infection. As such, it was histori- 
cally a childhood disease, with transmission occurring primarily in school 
environments. A basic model for measles has children becoming suscepti- 
ble to infection upon reaching school age [here taken to be r = 4 year, 
following Cauchemez and Ferguson (2008)]. This susceptible group is de- 
scribed by a compartment S containing, at time t, a number of individuals 
S(t). Upon exposure to infection, a transition occurs to compartment E in 
which individuals are infected but not yet infectious. The disease then pro- 
gresses to an infectious state /. Individuals are finally removed to a state R, 
due to bed-rest and subsequent recovery. This sequence of compartments is 
displayed graphically in Figure 4. We proceed to represent this system as 
a Markov chain with stochastic rates, via the notation of Section 2, with 
X(t) = (S(t),E(t),I(t),R(t),B(t),D(t)). Compartments B and D are intro- 
duced for demographic considerations: births are represented by transitions 
from B to S, with an appropriate delay r, and deaths by transitions into D. 

The seasonality of the transmissibility has been found to be well de- 
scribed by whether or not children are in school [Fine and Clarkson (1982), 



Fig. 4. Flow diagram for measles. Each individual host falls in one compartment: S, 
susceptible; E, exposed and infected but not yet infectious; I , infectious; R, removed and 
subsequently recovered and immune. Births enter S after a delay r, and all individuals 
have a mortality rate m. 
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Bauch and Earn (2003)]. Thus, we define a transmissibility function (3(t) by 



where d(t) is the integer- valued day (1-365) corresponding to the real- valued 
time t measured in years. This functional form allows reduced transmission 
during the Christmas vacation (days 356-365 and 1-6), Easter vacation 
(100-115), Summer vacation (200-251) and Autumn half-term (300-307). 
The rate of new infections is given the form 



Here, u describes infection from measles cases outside the population under 
study; a describes inhomogeneous mixing [Finkenstadt and Grenfell (2000)]; 
P(t) is the total population size, which is treated as a known covariate via 
interpolation from census data. Environmental stochasticity on transmission 
is included via a gamma noise process £se(t) with infinitesimal variance pa- 
rameter <Tg E ; transmission is presumed to be the most variable process in 
the system, and other transitions are taken to be noise-free. The two other 
disease parameters, pei and hir, are treated as unknown constants. We sup- 
pose a constant mortality rate, psD = Ped = Pid = HRD = m, and here we 
fix m = 1/50 year -1 . The recruitment of school-age children is specified by 
the process Nss(t) = [Jq b(s — r) ds\ , where b(t) is the birth rate, presented 
in Figure 3, and |_a^J is the integer part of x. We note that the construction 
above does not perfectly match the constraint S(t) + E(t) + I(t) + R(t) = 
P(t). For a childhood disease, such as measles, a good estimate of the birth 
rate is important, whereas the system is insensitive to the exact size of the 
adult population. 

All transitions not mentioned above are taken to have a rate of zero. 
To complete the model specification, a measurement model is required. Bi- 
weekly aggregated measles cases are denoted by C n = Njn(t n ) — Nm(t n -i) 
with t n being the time, in years, of the nth observation. Reporting rates 
p n are taken to be independent Gamma(l/0, p<f>) random variables. Condi- 
tional on p n , the observations are modeled as independent Poisson counts, 
Yn\Pn,C n ~ Poisson (p n C n ). Thus, Y n given C n has a negative binomial dis- 
tribution with E[Y n \C n ] = pC n and Var(Y n |C n ) = pC n + (\>p 2 C\. Note that 
the measurement model counts transitions into R, since individuals are re- 
moved from the infective pool (treated with bed-rest) once diagnosed. The 
measurement model allows for the possibility of both demographic stochas- 
ticity (i.e., Poisson variability) and environmental stochasticity (i.e., gamma 
variability on the rates). 

A likelihood ratio test concludes that, in the context of this model, en- 
vironmental stochasticity is clearly required to explain the data: the log 



(5) (3(t) = 



P H : (t) = 7-99, 116-199, 252-299, 308-355, 
Pl : d(t) = 356-6, 100-115, 200-251, 300-307, 



(6) 



psE = P(W(t)+u} a /P(t)- 
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likelihood for the full model was found to be —2504.9, compared to —2662.0 
for the restricted model with use = (p < 10~ 6 , chi-square test; results 
based on a time-step of 5 = 1 day in the Euler scheme of Figure 1 and a 
Monte Carlo sample size of J = 20000 when carrying out the iterated filter- 
ing algorithm in Figure 2). Future model-based scientific investigations of 
disease dynamics should consider environmental stochasticity when basing 
scientific conclusions on the results of formal statistical tests. 

Environmental stochasticity, like over-dispersion in generalized linear mod- 
els, is more readily detected than scientifically explained. Teasing apart 
the extent to which environmental stochasticity is describing model mis- 
specification rather than random phenomena in the system is beyond the 
scope of the present paper. The inference framework developed here will 
facilitate both asking and answering such questions. However, this distinc- 
tion is not relevant to the central statistical question of whether a particular 
class of scientifically motivated models requires environmental stochasticity 
(in the broadest sense of a source of variability above and beyond demo- 
graphic stochasticity) to explain the data. Models of biological systems are 
necessarily simplifications of complex processes [May (2004)], and as such, it 
is a legitimate role for environmental stochasticity to represent and quantify 
the contributions of unknown and/or unmodeled processes to the system 
under investigation. 

The environmental stochasticity identified here has consequences for the 
qualitative understanding of measles epidemics. Bauch and Earn (2003) have 
pointed out that demographic stochasticity is not sufficient to explain the 
deviations which historically occurred from periodic epidemics (at one, two 
or three year cycles, depending on the population size and the birth-rate). 
Simulations from the fitted model with environmental stochasticity are able 
to reproduce such irregularities (results not shown), giving a simple expla- 
nation of this phenomenon. This does not rule out the possibility that some 
other explanation, such as explicitly introducing a new covariate into the 
model, could give an even better explanation. 

In agreement with Cauchemez and Ferguson (2008), we have found that 
some combinations of parameters in our model are only weakly identifiable 
(i.e., they are formally identifiable, but have broad confidence intervals). 
Although this does not invalidate the above likelihood ratio test, it does 
cause difficulties interpreting parameter estimates. In the face of this prob- 
lem, Cauchemez and Ferguson (2008) made additional modeling assump- 
tions to improve identifiability of unknown parameters. Here, our goal is to 
demonstrate our modeling and inference framework, rather than to present 
a comprehensive investigation of measles dynamics. 

The analysis of Cauchemez and Ferguson (2008), together with other con- 
tributions by the same authors [Cauchemez et al. (2008, 2006)], represents 
the state of the art for Markov chain Monte Carlo analysis of population 
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dynamics. Whereas Cauchemez and Ferguson (2008) required model-specific 
approximations and analytic calculations to carry out likelihood-based infer- 
ence via their Markov chain Monte Carlo approach, our analysis is a routine 
application of the general framework in Sections 2 and 3. Our methodol- 
ogy also goes beyond that of Cauchemez and Ferguson (2008) by allowing 
the consideration of environmental stochasticity, and the inclusion of the 
disease latent period (represented by the compartment E) which has been 
found relevant to the disease dynamics [Finkenstadt and Grenfell (2000), 
Bjornstad, Finkenstadt and Grenfell (2002), Morton and Finkenstadt (2005)]. 
Furthermore, our approach generalizes readily to more complex biological 
systems, as demonstrated by the following example. 

4.2. A mechanistic model for competing strains of cholera. All infec- 
tious pathogens have a variety of strains, and a good understanding of 
the strain structure can be key to understanding the epidemiology of the 
disease, understanding evolution of resistance to medication, and devel- 
oping effective vaccines and vaccination strategies [Grenfell et al. (2004)]. 
Previous analyses relating mathematical consequences of strain structure 
to disease data include studies of malaria [Gupta et al. (1994)], dengue 
[Ferguson, Anderson and Gupta (1999)], influenza [Ferguson, Galvani and Bush 
(2003), Koelle et al. (2006a)] and cholera [Koelle, Pascual and Yunus (2006b)]. 
For measles, the strain structure is considered to have negligible importance 
for the transmission dynamics [Conlan and Grenfell (2007)], another reason 
why measles epidemics form a relatively simple biological system. In this 
section, we demonstrate that our mechanistic modeling framework permits 
likelihood based inference for mechanistically motivated stochastic models 
of strain-structured disease systems, and that the results can lead to fresh 
scientific insights. 

There are many possible immunological consequences of the presence of 
multiple strains, but it is often the case that exposure of a host to one 
strain of a pathogen results in some degree of protection (immunity) from re- 
infection by that strain and, frequently, somewhat weaker protection (cross- 
immunity) from infection by other strains. Immunologically distinct strains 
are called serotypes. In the case of cholera, there are currently two common 
serotypes, Inaba and Ogawa. Koelle, Pascual and Yunus (2006b), following 
the multistrain modeling approach of Kamo and Sasaki (2002), constructed 
a mechanistic, deterministic model of cholera transmission and immunity to 
investigate the pattern of changes in serotype dominance observed in cholera 
case report data collected in an intensive surveillance program conducted by 
the International Center for Diarrheal Disease Research, Bangladesh. They 
argued on the basis of a comparison of the data with features of typical tra- 
jectories of the dynamical model. Specifically, Koelle, Pascual and Yunus 
(2006b) found that the model would exhibit behavior which approximately 
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Year 



Fig. 5. Biweekly cholera cases obtained from hospital records of the International Center 
for Diarrheal Disease Research, Bangladesh for the district of Matlab, Bangladesh, 55 km 
SE of Dhaka. Cases are categorized into serotypes, Inaba (dashed) and Ogawa (solid gray). 
Each serotype may be further classified into one of two biotypes, El Tor and Classical, which 
are combined here, following Koelle et al. (2006b). The total population size of the district, 
in thousands, is shown as a dotted line. 



matched the period of cycles in strain dominance only when the cross- 
immunity was high, that is, when the probability of cross-protection was 
approximately 0.95. In addition, they found that their model's behavior de- 
pended very sensitively on the cross-immunity parameter. Here we employ 
formal likelihood-based inference on the same data to assess the strength of 
the evidence in favor of these conclusions. 




Fig. 6. Flow diagram for cholera, including interactions between the two major serotypes. 
Each individual host falls in one compartment: S, susceptible to both Inaba and Ogawa 
serotypes; I\, infected with Inaba; I2, infected with Ogawa; Si, susceptible to Inaba (but 
immune to Ogawa); S2, susceptible to Ogawa (but immune to Inaba); II, infected with 
Inaba (but immune to Ogawa); 7|, infected with Ogawa (but immune to Inaba); R, immune 
to both serotypes. Births enter S, and all individuals have a mortality rate m. 
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We analyzed a time series consisting of 30 years of biweekly cholera inci- 
dence records (Figure 5). For each cholera case, the serotype of the infect- 
ing strain — Inaba or Ogawa — was determined. We formulated a stochastic 
version of the model analyzed by Koelle, Pascual and Yunus (2006b). The 
model is shown diagrammatically in Figure 6, in which arrows represent pos- 
sible transitions, each labeled with the corresponding rate of flow. Table 1 
specifies the model formally, in the framework of Section 2, as a Markov chain 
with stochastic rates. The parameters in the model have standard epidemio- 
logical interpretations [Anderson and May (1991), Finkenstadt and Grenfell 
(2000), Koelle and Pascual (2004)]: Ai is the force of infection for the In- 
aba serotype, that is, the mean rate at which susceptible individuals become 
infected; £i is the stochastic noise on this rate; A2 and £2 are the correspond- 
ing force of infection and noise for the Ogawa serotype; (3(t) is the rate of 
transmission between individuals, parameterized with a linear trend and a 
smooth seasonal component; uj gives the rate of infection from an environ- 
mental reservoir, independent of the current number of contagious individu- 
als; the exponent a allows for inhomogeneous mixing of the population; r is 
the recovery rate from infection; 7 measures the strength of cross-immunity 
between serotypes. In this model, as in Koelle, Pascual and Yunus (2006b), 
infection with a given serotype results in life-long immunity to reinfection 
by that serotype. The argument for giving both strains common variabil- 
ity is that they are believed to be biologically similar except in regard to 
immune response. The strains have independent noise components because 
the noise represents chance events, such as a contaminated feast or a single 
community water source which is transiently in a favorable condition for 
contamination, and such events spread whichever strain is in the required 
place at the required time. 

To complete the model specification, we adopt an extension of the neg- 
ative binomial measurement model used for measles in Section 4.1. Bi- 
weekly aggregated cases for Inaba and Ogawa strains are denoted by Ci >n = 
Nsi t (t n ) ~ Nsii (* n -l) + Ns^* (t n ) ~ N SiI * (t n -i) for i = 1, 2 respectively and 
t n = 1975 + n/24. Reporting rates p± in and p2,n are taken to be indepen- 
dent Gamma(l/^>, pcj>) random variables. Conditional on p\^ n and p2,m the 
observations are modeled as independent Poisson counts, 

Yi >n \p i>n , C i>n ~ Poisson(ft in C iin ), i = l,2. 

Thus, Yi n given Ci^n has a negative binomial distribution with. £/[l^ )Tl |Cj «,] — 
pC i>n and Var(Y iin |G\ n ) = pC i>n + 4>p 2 C} n . 

Some results from fitting the model in Figure 6 via the method in Fig- 
ure 2 are shown in Table 2. The two sets of parameter values 9 a and 9 b in 
Table 2 are maximum likelihood estimates, with 9 a having the additional 
constraints p = 0.067 and r = 38.4. These two constraints were imposed by 
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Table 1 

Interpretation of Figure 6 via the multinomial process with random rates in (2), with 
X(t) = (S(t), h(t), hit), Si{t), S 2 (t), It(t), Ii{t), R{t), B(t), D(t)). Compartments B 

and D are introduced for demographic considerations: births are formally treated as 
transitions from B to S and deaths as transitions into D. All transitions not listed above 
have zero rate, ^(t) ind are independent gamma noise processes, each with 

infinitesimal variance parameter a . Transition rates are noise-free unless specified 
otherwise. Seasonality is modeled via a periodic cubic B-spline basis {sj(t),i = 1, . . . ,6}, 
where Si(t) attains its maximum at t — (i — l)/6. The population size P(t) is shown in 
Figure 5. The birth process is treated as a covariate, that is, the analysis is carried out 
conditional on the process N B s(t) = [Pit) - P(0) + / mP(s) ds\, where [x\ is the 

integer part of x. There is a small stochastic discrepancy between 
Sit) + h it) + h it) + Si it) + S 2 it) + It it) + 1% it) + R(t) and Pit) . In principle, one 
could condition on the demographic data by including a population measurement 
model — we saw no compelling reason to add this extra complexity for the current 
purposes. Numerical solutions of sample paths were calculated using the algorithm in 

Figure 1, with 5 — 2/365 
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Koelle, Pascual and Yunus (2006b), based on previous literature. The fit- 
ted model with the additional constraints is qualitatively different from the 
unconstrained model, and we refer to the neighborhoods of these two pa- 
rameter sets as regimes A and B. Figure 7 shows a profile likelihood for 
cross-immunity in regime A; this likelihood-based analysis leads to substan- 
tially lower cross-immunity than the estimate of Koelle, Pascual and Yunus 
(2006b). In regime B, the cross-immunity is estimated as being complete 
(7 = 1), however, the corresponding standard error is large: cross-immunity 
is poorly identified in regime B since the much higher reporting rate (p = 
0.65) means that there are many fewer cases and so few individuals are ever 
exposed to both serotypes. 

These two regimes demonstrate two distinct uses of a statistical model — 
first, to investigate the consequences of a set of assumptions and, second, 
to challenge those assumptions. If we take for granted the published esti- 
mates of certain parameter values, the resulting parameter estimates 9a are 
broadly consistent with previous models for cholera dynamics in terms of 
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Table 2 

Parameter estimates from both regimes. In both regimes, the mortality rate m is fixed at 
1/38.8 years -1 . The units of r, bo and lj are year -1 ; a has units year 1 / 2 ; and p, 7, <f>, a 
and 61,..., &6 are dimensionless. I is the average of two log-likelihood evaluations using a 
particle filter with 120,000 particles. Optimization was carried out using the iterated 
filtering in Figure 2, with M = 30, a = 0.95 and J — 15,000. Optimization parameters 
were selected via diagnostic convergence plots [Ionides, Breto and King (2006)]. Standard 
errors (SEs) were derived via a Hessian approximation; this is relatively rapid to 
compute and gives a reasonable idea of the scale of uncertainty, but profile likelihood 
based confidence intervals are more appropriate for formal inference 
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the fraction of the population which has acquired immunity to cholera and 
the fraction of cases which are asymptomatic (a term applied to the ma- 
jority of unreported cases which are presumed to have negligible symptoms 
but are nevertheless infectious). However, the likelihood values in Table 2 
call into question the assumptions behind regime A, since the data are bet- 
ter explained by regime B (p < 10~ 6 , likelihood ratio test), for which the 
epidemiologically relevant cases are only the severe cases that are likely to 
result in hospitalization. Unlike in regime A, asymptomatic cholera cases 
play almost no role in regime B, since the reporting rate is an order of mag- 
nitude higher. The contrast between these regimes highlights a conceptual 
limitation of compartment models: in point of fact, disease severity and level 
of infectiousness are continuous, not discrete or binary as they must be in 
basic compartment models. For example, differences in the level of morbid- 
ity required to be classified as "infected" result in re-interpretation of the 
parameters of the model, with consequent changes to fundamental model 
characteristics such as the basic reproductive ratio of an infectious disease 
[Anderson and May (1991)]. Despite this limitation, it remains the case that 
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Fig. 7. Cross-immunity profile likelihood for regime A, yielding a 99% confidence inter- 
val for 7 of (0.20, 0.61) based on a \ 2 approximation [e.g., Barndorff-Nielsen and Cox 
(1994)]- Each point corresponds to an optimization carried out as described in the cap- 
tion of Table 2. Local quadratic regression implemented in R via loess with a span of 0. 6 
[R Development Core Team (2006)] was used to estimate the profile likelihood, following 
Ionides (2005). 



compartment models are a fundamental platform for current understanding 
of disease dynamics and so identification and comparison of different inter- 
pretations is an important exercise. 

The existence of regime B shows that there is room for improvement in the 
model by departing from the assumptions in regime A. Extending the model 
to include differing levels of severity might permit a combination of the data- 
matching properties of B with the scientific interpretation of A. Other as- 
pects of cholera epidemiology [Sack et al. (2004), Kaper, Morris and Levine 
(1995)], not included in the models considered here, might affect the conclu- 
sions. For example, although we have followed Koelle, Pascual and Yunus 
(2006b) by assuming lifelong immunity following exposure to cholera, in re- 
ality this protection is believed to wane over time [King et al. (2008b)]. It 
is plain, however, that any such model modification can be subsumed in 
the modeling framework presented here and that effective inference for such 
models is possible using the same techniques. 

5. Discussion. This paper has focused on compartment models, a flexi- 
ble class of models which provides a broad perspective on the general topic 
of mechanistic models. The developments of this paper are also relevant 
to other systems. For example, compartment models are closely related to 
chemodynamic models, in which a Markov process is used to represent the 
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quantities of several chemical species undergoing transformations by chemi- 
cal reactions. The discrete nature of molecular counts can play a role, partic- 
ularly for large biological molecules [Reinker, Altman and Timmer (2006), 
Boys, Wilkinson and Kirkwood (2008)]. Our approach to stochastic transi- 
tion rates (Section 2) could readily be extended to chemodynamic models, 
and would allow for the possibility of over-dispersion in experimental sys- 
tems. 

Given rates fUj, one interpretation of a compartment model is to write 
the flows as coupled ordinary differential equations (ODEs), 



Data analysis via ODE models has challenges in its own right [Ramsay et al. 
(2007)]. One can include stochasticity in (7) by adding a slowly varying 
function to the derivative [Swishchuk and Wu (2003)]. Alternatively, one 
can add Gaussian white noise to give a set of coupled stochastic differential 
equations (SDEs) [e.g., Oksendal (1998)]. For example, if {Wij^it)} is a 
collection of independent standard Brownian motion processes, and (Jijk = 
(Jijk(t,X(t)), an SDE interpretation of a compartment model is given by 



SDEs have some favorable properties for mechanistic modeling, such as 
the ease with which stochastic models can be written down and inter- 
preted in terms of infinitesimal mean and variance [Ionides et al. (2004), 
Ionides, Breto and King (2006)]. However, there are several reasons to pre- 
fer integer-valued stochastic processes over SDEs for modeling population 
processes. Populations consist of discrete individuals, and, when a popula- 
tion becomes small, that discreteness can become important. For infectious 
diseases, there may be temporary extinctions, or "fade-outs," of the disease 
in a population or sub-population. Even if the SDE is an acceptable ap- 
proximation to the disease dynamics, there are technical reasons to prefer 
a discrete model. Standard methods allow exact simulation for continuous 
time Markov chains [Bremaud (1999), Gillespie (1977)], whereas for a non- 
linear SDE this is at best difficult [Beskos et al. (2006)]. In addition, if an 
approximate Euler solution for a compartment model is required, nonneg- 
ativity constraints can more readily be accommodated for Markov chain 
models, particularly when the model is specified by a limit of multinomial 
approximations, as in (2). The most basic discrete population compartment 
model is the Poisson system [Bremaud (1999)], defined here by 



(7) 



— Nij = HijXi(t). 




k 



(8) 



P[AN ij = n ij \X{t) = (x 1 ,...,x c )] 

= I I (/';,-'v , > )"••' (1 - IMjXiS) + o(5). 



i jjii 



24 



C. BRETO, D. HE, E. L. IONIDES AND A. A. KING 



The Poisson system is a Markov chain whose transitions consist of single 
individuals moving between compartments, that is, the infinitesimal proba- 
bility is negligible of either simultaneous transitions between different pairs 
of compartments or multiple transitions between a given pair of compart- 
ments. As a consequence of this, the Poisson system is "equidispersed," 
meaning that the infinitesimal mean of the increments equals the infinites- 
imal variance (Appendix B). Overdispersion is routinely observed in data 
[McCullagh and Nelder (1989)], and this leads us to consider models such as 
(2) for which the infinitesimal variance can exceed the infinitesimal mean. 
For infinitesimally over-dispersed systems, instantaneous transitions of more 
than one individual are possible. This may be scientifically plausible: a 
cholera-infected meal or water-jug may lead to several essentially simul- 
taneous cases; many people could be simultaneously exposed to an influenza 
patient on a crowded bus. Quite aside from this, if one wishes for what- 
ever reason to write down an over-dispersed Markov model, the inclusion 
of such possibilities is unavoidable. Simultaneity in the limiting continuous 
time model can alternatively be justified by arguing that the model only 
claims to capture macroscopic behavior over sufficiently long time intervals. 

Note that the multinomial distribution used in (2) could be replaced by 
alternatives, such as Poisson or negative binomial. These alternatives are 
more natural for unbounded processes, such as birth processes. For equidis- 
persed processes, that is, without adding white noise to the rates, the limit 
in (2) is the same if the multinomial is replaced by Poisson or negative bi- 
nomial. For overdispersed processes, these limits differ. In particular, the 
Poisson gamma and negative binomial gamma limits have unbounded jump 
distributions and so are less readily applicable to finite populations. 

The approach in (2) of adding white noise to the transition rates differs 
from previous approaches of making the rates a slowly varying random func- 
tion of time, that is, adding low frequency "red noise" to the rates. There 
are several motivations for introducing models based on white noise. Most 
simply, adding white noise can lead to more parsimonious parameterization, 
since the intensity but not the spectral shape of the noise needs to be con- 
sidered. The Markov property of white noise is inherited by the dynamical 
system, allowing the application of the extensive theory of Markov chains. 
White noise can also be used as a building block for constructing colored 
noise, for example, by employing an autoregressive model for the param- 
eters. At least for the specific examples of measles and cholera studied in 
Section 4, high-frequency variability in the rate of infection helps to explain 
the data (this was explicitly tested for measles; for our cholera models, Ta- 
ble 2 shows that the estimates of the environmental stochasticity parameter 
a are many standard errors from zero). Although variability in rates will not 
always be best modeled using white noise, there are many circumstances in 
which it is useful to be able to do so. 
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This article has taken a likelihood-based, non-Bayesian approach to sta- 
tistical inference. Many of the references cited follow the Bayesian paradigm. 
The examples of Section 4 and other recent work [Ionides, Breto and King 
(2006), King et al. (2008b)] show that iterated filtering methods enable rou- 
tine likelihood-based inference in some situations that have been challenging 
for Bayesian methodology. Bayesian and non-Bayesian analyses will continue 
to provide complementary approaches to inference for time series analysis 
via mechanistic models, as in other areas of statistics. 

Time series analysis is, by tradition, data oriented, and so the quantity 
and quality of available data may limit the questions that the data can rea- 
sonably answer. This forces a limit on the number of parameters that can be 
estimated for a model. Thus, a time series model termed mechanistic might 
be a simplification of a more complex model which more fully describes 
reductionist scientific understanding of the dynamical system. As one exam- 
ple, one could certainly argue for including age structure or other population 
inhomogeneities into Figure 6. Indeed, determining which additional model 
components lead to important improvement in the statistical description of 
the observed process is a key data analysis issue. 

APPENDIX A: THEOREMS CONCERNING COMPARTMENT 
MODELS WITH STOCHASTIC RATES 

Theorem A.l. Assume (P1)-(P5) and suppose that mj(t,x) is uni- 
formly continuous as a function of t. Suppose initial values X(0) = (Ai(0), 
. . . ,X C (0)) are given and denote the total number of individuals in the popu- 
lation by S = J2i Ai(0). Label the individuals 1, . . . , S and the compartments 
1, ...,c. Let C(£,0) be the compartment containing individual £ at time 
t = 0. Set T£ 5 o = 0, and generate independent Exponential 1) random vari- 
ables M^oj for each ( and j ^ C(C,0). We will define T^ >m j, r^ jm , C(C,m), 
and M^^ m j recursively for m > 1. Set 

(9) T Cjm j =inf|t: f n C (c,m-i),j(s,X(s))dTc(C,m-l),j{s) > M C)TO _iA. 

At time r^ jm = min, T^ tm j, set C((,m) = arg muij r^ jTn> j and for each j ^= 
C(£,m) generate an independent Exponential(l) random variable M^ )Tn j. 
Set 

dN l3 (t) = ]T I{C(C, m - 1) = i, C(C, m) = j, r c , m = i}, 

where !{•} is an indicator function, and set Xi(t) = -Xj(O) + f^Y^j^iidNji — 
dNij). X(t) is a Markov chain whose infinitesimal transition probabilities 
satisfy (2). 
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Note A.l. The random variables {M^ m j} in Theorem A.l are termed 
transition clocks, with the intuition that X(t) jumps when one (or more) 
of the integrated transition rates in (9) exceeds the value of its clock. In a 
more basic construction of a Markov chain, one re-starts the clocks for each 
individual whenever X(t) makes a transition. The memoryless property of 
the exponential distribution makes this equivalent to the construction of 
Theorem A.l, where clocks are restarted only for individuals who make a 
transition [Sellke (1983)]. Sellke's construction is convenient for the proof of 
Theorem A.l. 

Note A. 2. The trajectories of the individuals are coupled through the 
dependence of /j,(t,X(t)) on X(t), and through the noise processes which 
are shared for all individuals in a given compartment and may be dependent 
between compartments. Thus, to evaluate (9), it is necessary to keep track 
of all individuals simultaneously. To check that the integral in (9) is well 
defined, we note that X(s) depends only on {{T^ m , C(C, no)) : T^ :Tn <s,( = 
1, . . . , S, m = 1, 2, . . .}. X(s) is thus a function of events occurring by time s, 
so it is legitimate to use X(s) when constructing events that occur at t > s. 

Theorem A. 2. Supposing (P1)-(P7), the infinitesimal transition prob- 
abilities given by (2) are 

P[ANij = riijjor all X(t) = (rci,.. .,x c )\ 

= nn^inij^iifiij^ij) +o(6), 

i jj^i 

where 

7r(n, x, fj,, a) 

(!0) 

= l {n=0} + sfy £ (£) (-lf-fc+V- 2 ln(l + o*rtx - A:)). 

The full independence of {r^} assumed in Theorem A. 2 gives a form 
for the limiting probabilities where multiple individuals can move simulta- 
neously between some pair of compartments i and j, but no simultaneous 
transitions occur between different compartments. In more generality, the 
limiting probabilities do not have this simple structure. In the setup for 
Theorem A.l, where r^- is independent of for j ^ k, no simultaneous 
transitions occur out of some compartment i into different compartments 
j ^k, but simultaneous transitions from i to j and from i' to j' cannot be 
ruled out for i^i' . The assumption in Theorem A.l that Tij is independent 
of Tj/u for j ^ k is not necessary for the construction of a process via (2), 
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but simplifies the subsequent analysis. Without this assumption, a construc- 
tion similar to Theorem A.l would have to specify a rule for what happens 
when an individual who has two simultaneous event times, that is, when 
minj TQ^ m j is not uniquely attained. Although independence assumptions 
are useful for analytical results, a major purpose of the formulation in (2) is 
to allow the practical use of models that surpass currently available math- 
ematical analysis. In particular, it may be natural for different transition 
processes to share the same noise process, if they correspond to transitions 
between similar pairs of states. 

APPENDIX B: EQUIDISPERSION OF POISSON SYSTEMS 
For the Poisson system in (8), 

(11) P[ANij = 0\X(t) = x] = 1 - fiijXid + o(5), 

(12) P[ANij = l\X(t) = x] = fUjXid + o(S), 

where x = (x±, . . . , x c ) and fiij = fJ-ij(t, x). Since the state space of X(t) is fi- 
nite, it is not a major restriction to suppose that there is some uniform bound 
fiij(t,x)xi < u, and that the terms o(5) in (11), (12) are uniform in x and 
t. Then, P[ANij>k\X(t)]<F(k,6v), where F(k,\) = Y J f=k+i^ j e~ x / j\- It 
follows that 

oo 

E[ANij\X(t) =x] = J2 Pi&Nij > k\X{t) =x]= iHjXiS + o(5), 

k=0 

oo 

E[(AN l:j ) 2 \X(t) =x} = J2( 2k + tfPiANij > = x ] = Vijxrf + o(S), 

k=0 

and so Var(ANij\X(t) = x) = HijXiS + o(5). If the rate functions Hij(t, x) are 
themselves stochastic, with X(t) being a conditional Markov chain given 
{/% ) 1 < « < c, 1 < j < c}, a similar calculation applies so long as a uniform 
bound v still exists. In this case, 

(13) E[AN ij \X(t)=x] = SE[ f x ij (t,x)x i ] + o(S), 

(14) VsLT(AN ij \X(t)=x) = dE[n ij (t,x)x i ]+o(5). 

The necessity of the uniform bound v is demonstrated by the inconsistency 
between (13), (14) and the result in Theorem A. 2 for the addition of white 
noise to the rates. 
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SUPPLEMENTARY MATERIAL 

Theorems concerning compartment models with stochastic rates (DOI: 
10.1214/08-AOAS201SUPP; .pdf). We present proofs of Theorems A.l and A.2, 
which were stated in Appendix A. 
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